Week 3: Causes, Confounds, and Colliders

Good and bad controls

Four elemental confounds

finding backdoor paths

  1. Start at treatment (X)

  2. Look for any arrows coming INTO X

  3. Follow all possible paths to outcome (Y)

  4. A valid adjustment set blocks all backdoor paths

  5. But be careful not to control for colliders!
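These steps can be checked mechanically with the dagitty R package. A minimal sketch on an assumed example DAG (a fork: Z is a common cause of X and Y, plus a direct effect of X on Y — not the class DAG, just an illustration):

```r
library(dagitty)

# Assumed example DAG: Z is a common cause of X and Y (a fork)
g <- dagitty("dag { Z -> X ; Z -> Y ; X -> Y }")

paths(g, "X", "Y")            # lists the direct path and the backdoor X <- Z -> Y
adjustmentSets(g, "X", "Y")   # { Z } blocks the only backdoor path
```

`paths()` mirrors steps 1–3 (it enumerates every path, including those entering X), and `adjustmentSets()` mirrors steps 4–5 (it only returns sets that block all backdoors without conditioning on colliders).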

exercise

Go to dagitty.net and create this DAG:


  1. List all paths between Exercise and Health
  2. Identify which paths are backdoor paths
  3. Find all valid adjustment sets if we want to estimate the effect of Exercise on Health
  4. BONUS: What happens if we control for Motivation? Why?
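You can check your answers programmatically with the dagitty R package. The DAG below is purely hypothetical — replace it with the one you built on dagitty.net; here we assume, for illustration only, that Motivation is a collider between Exercise and Health:

```r
library(dagitty)

# Hypothetical DAG -- swap in the DAG from dagitty.net
g <- dagitty("dag {
  Exercise -> Health
  Exercise -> Motivation
  Health -> Motivation
}")

paths(g, "Exercise", "Health")           # which paths are open vs. closed?
adjustmentSets(g, "Exercise", "Health")  # all valid adjustment sets
```

Under this assumed structure, the path through Motivation is closed until you condition on it — which is exactly what the bonus question is probing.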

exercise: simulate simple confounding

Copy this code from the slides or the class book (Simulation 1: Simple Confounding).

  1. Run the base simulation and observe results

  2. Modify the simulation parameters:

  • Change the strength of the confounding (modify the 0.5, 0.8, and 0.6 coefficients)
  • Change the sample size (N)
  • Add a true causal effect (modify Y calculation to include X)
  3. Answer these questions:
  • What happens to the bias in the naive estimate as you increase the strength of confounding?
  • How does sample size affect the precision of your estimates?
  • When does controlling for Z fail to recover the true causal effect?
Code
library(rethinking)

# sample size
N <- 1000
# Generate data
U <- rnorm(N)  # Unobserved confounder
X <- rnorm(N, mean = 0.5 * U)  # Treatment affected by U
Y <- rnorm(N, mean = 0.8 * U)  # Outcome affected by U
Z <- rnorm(N, mean = 0.6 * U)  # Observed variable that captures U

d <- data.frame(X, Y, Z)

# Fit the naive model (X only)
flist1 <- alist(
  Y ~ dnorm(mu, sigma),
  mu <- a + bX*X,
  a ~ dnorm(0, .5),
  bX ~ dnorm(0, .25),
  sigma ~ dexp(1)
)

m32.1 <- quap(flist1, d)
precis(m32.1)
             mean         sd        5.5%      94.5%
a     -0.02554477 0.03869275 -0.08738326 0.03629372
bX     0.29078366 0.03536878  0.23425752 0.34730980
sigma  1.22711182 0.02741407  1.18329883 1.27092480
Code
# Fit the adjusted model (X and Z)
flist2 <- alist(
  Y ~ dnorm(mu, sigma),
  mu <- a + bX*X + bZ*Z,
  a ~ dnorm(0, .5),
  bX ~ dnorm(0, .25),
  bZ ~ dnorm(0, .25),
  sigma ~ dexp(1)
)

m32.2 <- quap(flist2, d)
precis(m32.2)
              mean         sd        5.5%      94.5%
a     -0.008883186 0.03715079 -0.06825733 0.05049095
bX     0.232572852 0.03451483  0.17741149 0.28773421
bZ     0.307952523 0.03303311  0.25515924 0.36074580
sigma  1.176568843 0.02628656  1.13455785 1.21857984
Code
library(tidyverse)  # for pivot_longer() and ggplot()

post.1 <- extract.samples(m32.1)
post.2 <- extract.samples(m32.2)

results_df <- data.frame(naive = post.1$bX,
                         adjusted = post.2$bX)
results_df %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(x = value, fill = name)) +
  geom_density(alpha = .5) +
  geom_vline(aes(xintercept = 0), linetype = "dashed")
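To explore the first question (how bias in the naive estimate scales with confounding strength) without refitting quap models, here is a quick base-R sketch using lm(); it sweeps the coefficient on U while the true effect of X on Y stays at zero:

```r
set.seed(5)  # seed chosen arbitrarily for reproducibility
n <- 1e4

naive_bias <- function(strength) {
  U <- rnorm(n)
  X <- rnorm(n, mean = strength * U)
  Y <- rnorm(n, mean = 0.8 * U)   # true effect of X on Y is 0
  coef(lm(Y ~ X))["X"]            # the naive slope is pure confounding bias
}

round(sapply(c(0.2, 0.5, 1.0), naive_bias), 2)
# theory: bias = 0.8 * s / (1 + s^2), so roughly 0.15, 0.32, 0.40
```

Note the bias is not monotone forever: once the confounding term dominates the variance of X, the denominator grows faster than the numerator.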

bad controls

“Bad controls” can create bias in three main ways:

  • Collider bias (as we saw in the previous exercise)
  • Precision parasites (reduce precision without addressing confounding)
  • Bias amplification (making existing bias worse)

Warning signs of bad controls:

  • Post-treatment variables
  • Variables affected by both treatment and outcome
  • Variables that don’t address actual confounding paths
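Collider bias is easy to demonstrate in a few lines. A sketch (not from the class materials) in which X and Y are independent but share a common effect C:

```r
set.seed(4)  # arbitrary seed for reproducibility
n <- 1e4
X <- rnorm(n)
Y <- rnorm(n)                 # X and Y are independent: true effect is 0
C <- rnorm(n, mean = X + Y)   # C is a collider: X -> C <- Y

coef(lm(Y ~ X))["X"]       # close to 0, as it should be
coef(lm(Y ~ X + C))["X"]   # conditioning on the collider induces a
                           # spurious association (about -0.5 in theory)
```

Intuitively, once you know C, a high X "explains away" the need for a high Y, creating a negative association where none exists.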

exercise

Modify your code for this new simulation (precision parasite):

n <- 100
# Z affects X but is not a confounder
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rnorm(n, mean = X)  # True effect of X on Y is 1

Test your previous models with these new data, using different sample sizes (n = 50, 100, 1000). For each sample size, compare:

  • Standard errors without controlling for Z
  • Standard errors when controlling for Z

How does sample size affect the impact of the precision parasite? Under what conditions is the precision loss most severe?
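As a sanity check for this comparison, here is a base-R sketch (using lm() in place of quap) that reports the standard error of the X coefficient with and without Z at each sample size:

```r
set.seed(2)  # arbitrary seed for reproducibility

se_compare <- function(n) {
  Z <- rnorm(n)
  X <- rnorm(n, mean = Z)   # Z causes X but is not a confounder
  Y <- rnorm(n, mean = X)   # true effect of X on Y is 1
  c(without_Z = summary(lm(Y ~ X))$coefficients["X", "Std. Error"],
    with_Z    = summary(lm(Y ~ X + Z))$coefficients["X", "Std. Error"])
}

round(sapply(c(50, 100, 1000), se_compare), 3)
# controlling for Z throws away the variation in X that Z explains,
# inflating the standard error by roughly sqrt(2) in this setup
```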

exercise

Modify your code for this new simulation (bias amplification):

n <- 100
conf_strength <- 1
# U is unmeasured confounder
U <- rnorm(n)
Z <- rnorm(n)
X <- rnorm(n, mean = Z + conf_strength * U)
Y <- rnorm(n, mean = conf_strength * U)  # No true effect of X

Compare different confounder strengths (0.5, 1, 2).

Questions:

  • What happens to the bias when you control for Z?
  • How does the strength of the confounding affect the amount of bias amplification?
  • Can you explain why this happens using the DAG?
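A sketch of what to expect, again using lm() rather than quap, for conf_strength = 1:

```r
set.seed(3)  # arbitrary seed for reproducibility
n <- 1e4
conf_strength <- 1
U <- rnorm(n)                                # unmeasured confounder
Z <- rnorm(n)                                # causes X only
X <- rnorm(n, mean = Z + conf_strength * U)
Y <- rnorm(n, mean = conf_strength * U)      # no true effect of X on Y

coef(lm(Y ~ X))["X"]       # confounded: about 1/3 in theory
coef(lm(Y ~ X + Z))["X"]   # controlling for Z removes "clean" variation
                           # in X, amplifying the bias toward 1/2
```

Controlling for Z shrinks the confounder-free part of X's variance while leaving the U-driven part untouched, so a larger share of the remaining variation in X is confounded — that is the amplification.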

exercise

Modify your code to create a scenario with both a precision parasite variable and a bias amplification variable.

Questions:

  • What happens to our estimates when we control for both variables?
  • Is it better to control for neither, for just one (which one?), or for both?
  • How can we use DAGs to decide which controls to include?